Additional file 1 of Surveying alignment-free features for ortholog detection in related yeast proteomes by using supervised big data classifiers
Proteome FASTA files for the following yeast species: S. cerevisiae, C. glabrata, K. waltii and K. lactis. (ZIP 5844 kb)
Exploring the Adenylation Domain Repertoire of Nonribosomal Peptide Synthetases Using an Ensemble of Sequence-Search Methods
<div><p>Two-dimensional (2D) graphs and their numerical characterization, used to compare DNA/RNA and protein sequences without sequence alignment, are a recent and active research topic in bioinformatics. Here, we used an artificial 2D representation (four-color maps) with a simple numerical characterization through topological indices (TIs) to aid the discovery of remote homologs of adenylation domains (A-domains) from the nonribosomal peptide synthetase (NRPS) class in the proteome of the cyanobacterium <i>Microcystis aeruginosa</i>. Cyanobacteria are a rich source of structurally diverse oligopeptides that are predominantly synthesized by NRPS. Several A-domains share amino acid identities lower than 20%, which makes them a possible source of remote homologs. Consequently, A-domains cannot be easily retrieved by BLASTp searches using a single template. To cope with the sequence diversity of the A-domains, we combined homology-search methods with an alignment-free tool that uses protein four-color maps. <b>TI2BioP</b> (<b>T</b>opological <b>I</b>ndices <b>to</b> <b>BioP</b>olymers) <i>version 2.0</i>, available at <a href="http://ti2biop.sourceforge.net/" target="_blank">http://ti2biop.sourceforge.net/</a>, allowed the calculation of simple TIs from the protein sequences (four-color maps). These TIs were used as input predictors for the statistical estimations required to build the alignment-free models. We conclude that graphical/numerical approaches, used together with other sequence-search methods such as multi-template BLASTp and profile HMMs, can give the most complete exploration of the repertoire of highly diverse protein families.</p></div>
Testing different topologies for the MLP on the A-domain classification using TIs from four-color maps.
<p>Accuracy and error on the training, selection and test sets.</p>
From the protein sequence to its numerical characterization.
<p>(A) The first nine amino acids of PDB entry 1AMU. (B and C) Building the four-color map for A. (D) Definition of the node adjacency matrix derived from the four-color map in C.</p>
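As a rough illustration of how a node adjacency matrix can be turned into simple numerical descriptors, the sketch below builds the adjacency matrix of a small grid of regions and computes its spectral moments (traces of powers of the matrix). This is not the TI2BioP implementation: the 3 x 3 row-wise layout of nine regions, the edge-sharing adjacency rule, the function names and the choice of spectral moments as the index are all assumptions made for this example, since the listing does not describe how the four-color map is laid out or which TIs are computed.

```python
def grid_adjacency(rows, cols):
    """Adjacency matrix for a rows x cols grid of regions.

    Assumption: two regions are adjacent when they share an edge
    (left/right or up/down neighbours). Nodes are numbered row by row.
    """
    n = rows * cols
    adj = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:            # neighbour to the right
                j = i + 1
                adj[i][j] = adj[j][i] = 1
            if r + 1 < rows:            # neighbour below
                j = i + cols
                adj[i][j] = adj[j][i] = 1
    return adj


def mat_mul(a, b):
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(adj, max_order):
    """Return [trace(A^1), ..., trace(A^max_order)] for adjacency matrix A."""
    moments = []
    power = adj
    for _ in range(max_order):
        moments.append(sum(power[i][i] for i in range(len(adj))))
        power = mat_mul(power, adj)
    return moments


# Hypothetical layout: nine regions (one per residue) on a 3 x 3 grid.
adj = grid_adjacency(3, 3)
moments = spectral_moments(adj, 3)
print(moments)
```

For any simple undirected graph, the first moment is zero (no self-loops) and the second equals twice the number of edges, which gives a quick sanity check on the matrix before such values are used as predictors.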
True positives <i>vs</i>. false positives in the A-domain detection for different sequence-search methods among the overall dataset involved in the study.
Classification results for the three alignment-free models (GDA, DTM and ANN) in A-domain detection.
Architecture of the DTM. Decision nodes are shown in blue and terminal nodes in red.
<p>A-domains are marked with a dashed line, whereas CATH domains are marked with a continuous line. Labels at the right corner of each node indicate tentative membership in the A-domain or CATH domain class. Numbers at the left corner give the node number.</p>
Assessing the relationship between the number of TIs entered in each model and the Wilks' λ value obtained for each one.
Dot plot of the global sequence identity matrix obtained with the Needleman-Wunsch algorithm for A-domains.
<p>(A) All A-domains involved in the study. (B) A-domains of the test set.</p>
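To make the notion of a global sequence identity matrix concrete, the following minimal sketch aligns pairs of sequences with a basic Needleman-Wunsch implementation and reports identity as matching columns divided by alignment length. The scoring values (match +1, mismatch -1, linear gap -1), the identity convention and the toy sequences are assumptions for illustration; the listing does not state the scoring scheme used in the study, and real protein alignments normally use a substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns the two aligned strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def global_identity(a, b):
    """Fraction of alignment columns in which both sequences carry the same residue."""
    aln_a, aln_b = needleman_wunsch(a, b)
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return matches / len(aln_a)


def identity_matrix(seqs):
    """Pairwise global identity for a list of sequences."""
    return [[global_identity(s, t) for t in seqs] for s in seqs]


# Toy sequences, not taken from the study.
toy = ["ACGT", "AGT", "ACGT"]
matrix = identity_matrix(toy)
```

Plotting such a matrix as a dot plot or heat map shows at a glance which pairs fall below a chosen identity threshold.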
Re-annotation of the A-domains in the proteome of <i>Microcystis aeruginosa</i> by using an ensemble of algorithms.